Optimizing the Knowledge Discovery Process through Semantic Meta-Mining
Abstract
I will describe a novel meta-learning approach to optimizing the knowledge discovery or data mining (DM) process. This approach has three features that distinguish it from its predecessors. First, previous meta-learning research has focused exclusively on improving the learning phase of the DM process. More specifically, the goal of meta-learning has typically been to select the most appropriate algorithm and/or parameter settings for a given learning task. We adopt a more process-oriented approach whereby meta-learning is applied to design choices at different stages of the complete data mining process or workflow (hence the term meta-mining). Second, meta-learning for algorithm or model selection has consisted mainly of mapping dataset properties to the observed performance of algorithms viewed as black boxes. While several generations of researchers have worked intensively on characterizing datasets, little has been done to understand the internal mechanisms of the algorithms used. At best, a few have considered perceptible features of algorithms like their ease of implementation, their robustness to noise, or the interpretability of the models they produce. In contrast, our meta-learning approach complements dataset descriptions with an in-depth analysis and characterization of algorithms: their underlying assumptions, optimization goals and strategies, together with the structure and complexity of the models and patterns they generate. Third, previous meta-learning approaches have been strictly (meta) data-driven. To make sense of the intricate relationships between tasks, data and algorithms at different stages of the data mining process, our meta-miner relies on extensive background knowledge concerning knowledge discovery itself. For this reason we have developed a data mining ontology, which defines the essential concepts and relations needed to represent and analyse data mining objects and processes.
In addition, a DM knowledge base gathers assertions concerning data preprocessing and machine learning algorithms as well as their implementations in several open-source software packages. The DM ontology and knowledge base are domain-independent; they can be exploited in any application area to build databases describing domain-specific data analysis tasks, datasets and experiments. Aside from their direct utility in their respective target domains, such databases are the indispensable source of training and evaluation data for the meta-miner. These three features together lay the groundwork for semantic meta-mining, the process of mining DM meta-data on the basis of data mining expertise distilled in an ontology and knowledge base.
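The idea of describing both datasets and algorithms, then learning from a database of past experiments, can be illustrated with a small sketch. Everything in it is an assumption made for illustration: the descriptor names (`n_features`, `noise`, `feature_selection`, `model_complexity`), the accuracy values, and the nearest-neighbour predictor are hypothetical stand-ins, not the ontology, knowledge base, or meta-miner described in the abstract.

```python
from math import sqrt

# Hypothetical meta-data: one record per past DM experiment.
# Each record pairs dataset descriptors with workflow descriptors
# (the kind of algorithm-level knowledge an ontology could supply)
# and the performance that was observed. All values are invented.
EXPERIMENTS = [
    ({"n_features": 0.9, "noise": 0.8}, {"feature_selection": 1.0, "model_complexity": 0.3}, 0.84),
    ({"n_features": 0.9, "noise": 0.8}, {"feature_selection": 0.0, "model_complexity": 0.9}, 0.61),
    ({"n_features": 0.2, "noise": 0.1}, {"feature_selection": 0.0, "model_complexity": 0.9}, 0.92),
    ({"n_features": 0.2, "noise": 0.1}, {"feature_selection": 1.0, "model_complexity": 0.3}, 0.88),
]


def to_vector(dataset, workflow):
    """Concatenate dataset and workflow descriptors into one meta-feature vector."""
    return [dataset[k] for k in sorted(dataset)] + [workflow[k] for k in sorted(workflow)]


def distance(a, b):
    """Euclidean distance between two meta-feature vectors."""
    return sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def predict_performance(dataset, workflow, k=1):
    """Average the observed performance of the k most similar past experiments."""
    query = to_vector(dataset, workflow)
    ranked = sorted(EXPERIMENTS, key=lambda rec: distance(query, to_vector(rec[0], rec[1])))
    nearest = ranked[:k]
    return sum(rec[2] for rec in nearest) / len(nearest)


def recommend(dataset, candidate_workflows):
    """Return the candidate workflow with the highest predicted performance."""
    return max(candidate_workflows, key=lambda wf: predict_performance(dataset, wf))
```

Because the workflow descriptors cover a preprocessing choice as well as a property of the learner, the recommendation ranges over the whole workflow and not only the learning step, which is the distinction the abstract draws between meta-mining and classical meta-learning.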
Similar resources
The 2nd International Workshop on Inductive Reasoning and Machine Learning for the Semantic Web Proceedings
I will describe a novel meta-learning approach to optimizing the knowledge discovery or data mining (DM) process. This approach has three features that distinguish it from its predecessors. First, previous meta-learning research has focused exclusively on improving the learning phase of the DM process. More specifically, the goal of meta-learning has typically been to select the ...
Visualization and Database Support for Geographic Meta-Mining
Introduction Geographic data mining can be defined as a set of exploratory computational and statistical approaches for analyzing very large spatial and spatiotemporal data sets. Data mining techniques are often grouped into categories that include clustering, categorization, summarization, rule-mining, and feature extraction. All of these types of techniques are generally oriented towards iden...
Pattern Based Feature Construction in Semantic Data Mining
The authors propose a new method for mining sets of patterns for classification, where patterns are represented as SPARQL queries over RDFS. The method contributes to so-called semantic data mining, a data mining approach where domain ontologies are used as background knowledge, and where the new challenge is to mine knowledge encoded in domain ontologies, rather than only purely empirical data...
Preface to Third International Workshop on Semantic Aspects in Data Mining (SADM’10)
The SADM Workshop addresses the issue of explicitly considering data semantics, background knowledge, and reasoning in Data Mining and Knowledge Discovery. With the increasing complexity of the form of data, such as textual, biological, and spatio-temporal data, there is an increasing need to enhance the knowledge discovery process with semantic information. However, the semantics of the data, as well as the ...
Semantic Subgroup Discovery Systems and Workflows in the SDM-Toolkit
This paper addresses semantic data mining, a new data mining paradigm in which ontologies are exploited in the process of data mining and knowledge discovery. This paradigm is introduced together with new semantic subgroup discovery systems SDM-search for enriched gene sets (SEGS) and SDM-Aleph. These systems are made publicly available in the new SDM-Toolkit for semantic data mining. The toolk...
Publication date: 2010